You are given an array of positive integers w where w[i] describes the weight of ith index (0-indexed).
We need to call the function pickIndex() which randomly returns an integer in the range [0, w.length - 1]. pickIndex() should return the integer proportional to its weight in the w array. For example, for w = [1, 3], the probability of picking the index 0 is 1 / (1 + 3) = 0.25 (i.e 25%) while the probability of picking the index 1 is 3 / (1 + 3) = 0.75 (i.e 75%).
More formally, the probability of picking index i is w[i] / sum(w).
Example 1:
Input ["Solution","pickIndex"] [[[1]],[]] Output [null,0] Explanation Solution solution = new Solution([1]); solution.pickIndex(); // return 0. Since there is only one single element on the array the only option is to return the first element.
Example 2:
Input ["Solution","pickIndex","pickIndex","pickIndex","pickIndex","pickIndex"] [[[1,3]],[],[],[],[],[]] Output [null,1,1,1,1,0] Explanation Solution solution = new Solution([1, 3]); solution.pickIndex(); // return 1. It's returning the second element (index = 1) that has probability of 3/4. solution.pickIndex(); // return 1 solution.pickIndex(); // return 1 solution.pickIndex(); // return 1 solution.pickIndex(); // return 0. It's returning the first element (index = 0) that has probability of 1/4. Since this is a randomization problem, multiple answers are allowed so the following outputs can be considered correct : [null,1,1,1,1,0] [null,1,1,1,1,1] [null,1,1,1,0,0] [null,1,1,1,0,1] [null,1,0,1,0,0] ...... and so on.
Constraints:
1 <= w.length <= 100001 <= w[i] <= 10^5pickIndexwill be called at most10000times.
Average Rating: 4.12 (107 votes)
Solution
Overview
This is actually a very practical problem which appears often in the scenario where we need to do sampling over a set of data.
Nowadays, people talk a lot about machine learning algorithms. As many would reckon, one of the basic operations involved in training a machine learning algorithm (e.g. Decision Tree) is to sample a batch of data and feed them into the model, rather than taking the entire data set. There are several rationales behind doing sampling over data, which we will not cover in detail, since it is not the focus of this article.
If one is interested, one can refer to our Explore card of Machine Learning 101 which gives an overview on the fundamental concepts of machine learning, as well as the Explore card of Decision Tree which explains in detail on how to construct a decision tree algorithm.
Now, given the above background, hopefully one is convinced that this is an interesting problem, and it is definitely worth solving.
Intuition
Given a list of positive values, we are asked to randomly pick up a value based on the weight of each value. To put it simple, the task is to do sampling with weight.
Let us look at a simple example. Given an input list of values [1, 9], when we pick up a number out of it, the chance is that 9 times out of 10 we should pick the number 9 as the answer.
In other words, the probability that a number got picked is proportional to the value of the number, with regards to the total sum of all numbers.
To understand the problem better, let us imagine that there is a line in the space, we then project each number into the line according to its value, i.e. a large number would occupy a broader range on the line compared to a small number. For example, the range for the number 9 should be exactly nine times as the range for the number 1.
Now, let us throw a ball randomly onto the line, then it is safe to say there is a good chance that the ball will fall into the range occupied by the number 9. In fact, if we repeat this experiment for a large number of times, then statistically speaking, 9 out of 10 times the ball will fall into the range for the number 9.
Voila. That is the intuition behind this problem.
Simulation
So to solve the problem, we can simply simulate the aforementioned experiment with a computer program.
First of all, let us construct the line in the experiment by chaining up all values together.
Let us denote a list of numbers as [w1,w2,w3,...,wn]. Starting from the beginning of the line, we then can represent the offsets for each range K as (∑1Kwi,∑1K+1wi), as shown in the following graph:
As many of you might recognize now, the offsets of the ranges are actually the prefix sums from a sequence of numbers. For each number in a sequence, its corresponding prefix sum, also known as cumulative sum, is the sum of all previous numbers in the sequence plus the number itself.
As an observation from the definition of prefix sums, one can see that the list of prefix sums would be strictly monotonically increasing, if all numbers are positive.
To throw a ball on the line is to find an offset to place the ball. Let us call this offset target.
Once we randomly generate the target offset, the task is now boiled down to finding the range that this target falls into.
Let us rephrase the problem now, given a list of offsets (i.e. prefix sums) and a target offset, our task is to fit the target offset into the list so that the ascending order is maintained.
Approach 1: Prefix Sums with Linear Search
Intuition
If one comes across this problem during an interview, one can consider the problem almost resolved, once one reduces the original problem down to the problem of inserting an element into a sorted list.
Concerning the above problem, arguably the most intuitive solution would be linear search. Many of you might have already thought one step ahead, by noticing that the input list is sorted, which is a sign to apply a more advanced search algorithm called binary search.
Let us do one thing at one time. In this approach, we will first focus on the linear search algorithm so that we could work out other implementation details. In the next approach, we will then improve upon this approach with a binary search algorithm.
So far, there is one little detail that we haven't discussed, which is how to randomly generate a target offset for the ball. By "randomly", we should ensure that each point on the line has an equal opportunity to be the target offset for the ball.
In most of the programming languages, we have some random() function that generates a random value between 0 and 1. We can scale up this randomly-generated value to the entire range of the line, by multiplying it with the size of the range. At the end, we could use this scaled random value as our target offset.
As an alternative solution, sometimes one might find a randomInteger(range) function that could generate a random integer from a given range.
One could then directly use the output of this function as our target offset.
Here, we adopt the random() function, since it could also work for the case where the weights are float values.
Algorithm
We now should have all the elements at hand for the implementation.
-
First of all, before picking an index, we should first set up the playground, by generating a list of prefix sums from a given list of numbers. The best place to do so would be in the constructor of the class, so that we don't have to generate it again and again at the invocation of
pickIndex()function.- In the constructor, we should also keep the total sum of the input numbers, so that later we could use this total sum to scale up the random number.
-
For the
pickIndex()function, here are the steps that we should perform.-
Firstly, we generate a random number between 0 and 1. We then scale up this number, which will serve as our
targetoffset. -
We then scan through the prefix sums that we generated before by linear search, to find the first prefix sum that is larger than our
targetoffset. -
And the index of this prefix sum would be exactly the right place that the target should fall into. We return the index as the result of
pickIndex()function.
-
Complexity Analysis
Let N be the length of the input list.
-
Time Complexity
-
For the constructor function, the time complexity would be O(N), which is due to the construction of the prefix sums.
-
For the
pickIndex()function, its time complexity would be O(N) as well, since we did a linear search on the prefix sums.
-
-
Space Complexity
-
For the constructor function, the space complexity would be O(N), which is again due to the construction of the prefix sums.
-
For the
pickIndex()function, its space complexity would be O(1), since it uses constant memory. Note, here we consider the prefix sums that it operates on, as the input of the function.
-
Approach 2: Prefix Sums with Binary Search
Intuition
As we promised before, we could improve the above approach by replacing the linear search with the binary search, which then can reduce the time complexity of the pickIndex() function from O(N) to O(logN).
As a reminder, the condition to apply binary search on a list is that the list should be sorted, either in ascending or descending order. For the list of prefix sums that we search on, this condition is guaranteed, as we discussed before.
Algorithm
We could base our implementation largely on the previous approach.
In fact, the only place we need to modify is the pickIndex() function, where we replace the linear search with the binary search.
As a reminder, there exist built-in functions of binary search in almost all programming languages. If one comes across this problem during the interview, it might be acceptable to use any of the built-in functions.
On the other hand, the interviewers might insist on implementing a binary search by hand. It would be good to prepare for this request as well.
There are several code patterns to implement a binary search algorithm, which we cover in the Explore card of Binary Search algorithm. One can refer to the card for more details.
Complexity Analysis
Let N be the length of the input list.
-
Time Complexity
-
For the constructor function, the time complexity would be O(N), which is due to the construction of the prefix sums.
-
For the
pickIndex()function, this time its time complexity would be O(logN), since we did a binary search on the prefix sums.
-
-
Space Complexity
-
For the constructor function, the space complexity remains O(N), which is again due to the construction of the prefix sums.
-
For the
pickIndex()function, its space complexity would be O(1), since it uses constant memory. Note, here we consider the prefix sums that it operates on, as the input of the function.
-
June 5, 2020 7:18 PM
hardest part of this question is figuring out what the question is about.
Last Edit: February 12, 2021 11:18 PM
Explanation of why prefixSum works:
Think that if we had an array [1,2,3,4,3]. Normal random pickIndex would pick any index from 0 to 4 with equal probability. But we want that index=1 is picked by 2/13 probability, index=0 with 1/13 probability and so on. (13 is sum of weights). To ensure that one way to think of it if we make a larger array (of size 13) where the values are the indices such that index i is repeated w[i] times then if we do a normal rand on this array then index 0 to 12 will be picked randomly with equal probability. 13 index array -> [0, 1,1, 2,2,2, 3,3,3,3, 4,4,4]. So there is a 3/13 chance of picking 2 as 2 is repeated thrice in the new array.
Now instead of actually constructing this 13 index array, we just know the range of the index of the 13 index array where value = i. Eg:
- for index=0, range is {0,0}
- index =1, range of indices of the new array is {1,2}
- index=2, range={3,5}
- index=3, range ={6,9}
- index = 4, range = {10,12}
In other words,
- index=0, range is <1
- index=1, range is <3
- index=2, range is <6
- index = 3, range is < 10
- index = 4, range is < 13
If you notice the above numbers 1,3,6,10,13 - they are cumulative sum.
The reason this happens is because for every range: right = left + (w[i] - 1) and left is (prev right+1). So if we substitute 2nd equation into 1st. right = (prev right)+w[i]; i.e. keep adding prev sum to current weight.
Thus the prefixSum is able to implement this.
Thanks for the explanation. I like the C++ example code which many other articles do not have. I feel like a second class citizen when I use C++ here :)
June 5, 2020 11:37 PM
It's the worst description I have ever seen in leetcode. Could anyone improve the description?
Please revise the question content, horrible writing and logic....
Great Article. I read the first part of the Article to understand the question. I wish this below sentence part of the question.
" Let us look at a simple example. Given an input list of values [1, 9], when we pick up a number out of it,
the chance is that 9 times out of 10 we should pick the number 9
as the answer."
You make medium questions sound like hard.
Team Leetcode! shame on you for such a bad phrasing of questions.
Thank you for the clarification, in the solution.
However, the question is not clear at all.
It accepts answer in a very specific way, that should be explained in the question with examples.
How the input given in the examples are applicable to this ?
June 5, 2020 1:06 PM
The method to get a random value in a range, the author use, is not uniform.
For more information you can search or read this: https://ericlippert.com/2013/12/16/how-much-bias-is-introduced-by-the-remainder-technique/
xxxxxxxxxxclass Solution {public: Solution(vector<int>& w) { } int pickIndex() { }};/** * Your Solution object will be instantiated and called as such: * Solution* obj = new Solution(w); * int param_1 = obj->pickIndex(); */
